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TRACKING  ON  INTENSITY-MODULATED  DATA  STREAMS 


1.  INTRODUCTION 

The  theoretical  methodology  of  probabilistic  multi-hypothesis  tracking  (PMHT)  [1]  is 
extended  to  tracking  one  or  more  targets  moving  against  a  noisy  background  in  an  intensity- 
modulated,  nonstationary  data  stream.  The  algorithm  is  called  herein  the  histogram-PMHT 
algorithm  because  its  derivation  depends  in  a  crucial  way  on  a  synthetic  histogram  interpre¬ 
tation  of  the  measured  data.  The  goal  is  to  derive  a  multi-target  tracking  algorithm  that 
utilizes  all  the  data  available  to  the  sensor  display,  thereby  completely  avoiding  the  current 
widespread  practice  of  thresholding  the  sensor  data  (e.g.,  peak  picking)  to  generate  point 
“measurements”  that  are  subsequently  fed  to  a  tracker.  Preprocessing  the  sensor  data  in 
this  way  results  in  a  loss  of  information  that  may  significantly  increase  the  number  of  lost 
tracks  and  the  tracking  error  in  some  applications.  The  fundamental  premise  of  this  report 
is  that  thresholding  loss  can  be  eliminated  if  the  entire  sensor  output  data  set  is  properly 
utilized  by  the  tracking  algorithm.  The  histogram-PMHT  algorithm  derived  in  this  report 
uses  all  the  sensor  output  data. 

Nonstationary,  intensity-modulated  data  streams  arise  in  many  applications.  A  common¬ 
place  one-dimensional  application  is  a  waterfall  (or  similar)  display  of  short-term,  magnitude- 
squared,  Fourier  transform  data  versus  frequency.  For  passive  directional  arrays,  another 
one-dimensional  application  is  a  waterfall  display  of  received  broadband  power  as  a  function 
of  bearing.  A  two-dimensional  application  to  directional  arrays  is  the  sequence  of  intensity- 
modulated  bearing-frequency  images  that  arises  if  short-term  spectra  are  computed  on  the 
arrays’  directional  beam  outputs. 

The  development  in  this  report  is  general  in  that  sensor  data  of  any  dimensionality  are 
treated,  and  the  general  version  of  PMHT  [1]  is  utilized.  It  is  shown  that  the  histogram- 
PMHT  algorithm  is  an  iteratively  reweighted  Kalman  filter  for  traditional  single-scan  up¬ 
dates.  For  batch  data,  the  histogram-PMHT  algorithm  is  an  iteratively  reweighted  Kalman 
smoothing  filter.  Because  histogram-PMHT  uses  a  richer  data  set  than  PMHT,  it  requires 
more  overhead  (i.e.,  computer  resources)  than  PMHT  to  compute  the  iteratively  re-estimated 
synthetic  measurements  used  in  the  algorithm;  however,  the  computational  complexity  of 
the  histogram-PMHT  algorithm  is  otherwise  equivalent  to  that  of  the  PMHT  algorithm. 

An  important  theoretical  aspect  of  the  work  is  the  unorthodox  way  in  which  a  Bayesian 
a  priori  density  on  target  state  is  added  to  the  model.  The  usual  approach  fails  in  this 
problem  because  an  intermediate  measurement  quantization  procedure  creates  an  infinite 
amount  of  synthetic  histogram  data  (in  the  limit  as  the  quantization  level  goes  to  zero), 
and  this  quantity  of  data  overwhelms  the  usual  prior  density.  The  “trick”  used  here  is  to 
apply  an  a  priori  density  to  each  point  in  the  synthetically  generated  histogram,  so  that 
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the  overall  a  priori  density  is  properly  “balanced”  against  the  histogram  data  at  every 
level  of  quantization.  Using  this  data-dependent  Bayesian  density,  the  quantization  step 
can  be  completely  eliminated  via  a  limiting  argument.  Consequently,  the  histogram-PMHT 
algorithm  uses  intensity-modulated  sensor  data  and  not  synthetic  histogram  data. 

Luginbuhl  was  the  first  to  suggest  the  joint  application  of  histogram  and  mixture  modeling 
methodologies  to  intensity-modulated  displays.  In  his  Ph.  D.  thesis  [2],  he  applied  them 
to  the  problem  of  estimating  a  general,  discrete-time,  frequency-modulated  process  using  a 
one-dimensional  waterfall  display  of  unaveraged,  short-term,  Fourier  transform  data.  His 
methodology  draws  on  several  sources  of  earlier  work.  The  fundamental  idea  of  applying 
the  method  of  expectation-maximization  (EM)  to  histogram  data  is  due  to  Dempster,  Laird, 
and  Rubin  in  their  original  EM  paper  [3].  Dempster  and  Rubin  [4]  used  the  EM  method  to 
derive  Shepard’s  corrections  for  one-dimensional  histogram  data.  McLachlan  and  Jones  [5] 
and  Jones  and  McLachlan  [6,7]  were  the  first  to  apply  the  EM  method  to  estimate  Gaussian 
mixtures  from  one-dimensional  histogram  data.  Among  these  references  [2-7],  only  Luginbuhl 
considered  the  important  problem  of  linking  successive  histograms  using  a  dynamical  model. 

This  report  is  organized  around  the  discussion  of  the  fixed  batch  length  problem,  that  is, 
when  all  the  data  from  the  initial  to  the  current  scan  are  processed  jointly  by  the  histogram- 
PMHT  algorithm.  The  recursive  problem,  in  which  the  batch  is  a  fixed  length  time  window 
that  slides  over  a  data  stream,  is  discussed  only  after  the  intricate  theoretical  details  are 
ironed  out  for  the  fixed  batch  problem.  Throughout  the  report  specific  parameterizations 
are  not  introduced  until  they  are  needed.  This  approach  not  only  minimizes  the  rather 
extensive  notational  difficulties  that  are  common  to  algorithms  developed  by  the  method  of 
EM,  but  it  also  clarifies  the  theoretical  development. 

Section  2  formulates  the  incomplete  data  likelihood  function  using  the  synthetic  his¬ 
togram  point  of  view.  Section  3  formulates  the  complete  data  likelihood  function  using  three 
progressive  stages  of  missing  data.  Section  4  evaluates  the  E-step,  that  is,  the  conditional 
expectation  of  the  logarithm  of  the  complete  data  likelihood  function  with  respect  to  the 
missing  data.  This  section  discusses  the  data-dependent  a  priori  density,  and  it  also  shows 
how  to  eliminate  the  synthetic  histogram  by  taking  the  limit  as  the  quantization  level  goes 
to  zero.  Section  5  gives  the  auxiliary  function  in  a  separable  form  suited  to  the  multi-target 
application.  Section  6  evaluates  the  M-step;  that,  is,  it  solves  for  parameters  that  maximize 
the  auxiliary  function  evaluated  in  the  E-step.  Section  7  is  especially  important  because  it 
states  the  histogram-PMHT  algorithm  for  the  fixed  and  recursive  batch  problems  in  a  form 
suitable  for  implementation.  Section  8  discusses  modifications  that  can  be  incorporated  into 
the  algorithm  and  that  may  have  utility  for  some  applications.  A  summary  is  given  in  section 
9. 
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2.  INCOMPLETE  DATA  LIKELIHOOD  FUNCTION 


Let  C  =  {Ci,  ...,Cs},  S  >  1,  denote  the  collection  of  all  possible  sensor  cells.  It  is 
assumed  that  Ci  n  Cj  =  4>  for  all  i  and  j  and  that  Ci  U  ...  U  Cs  =  Rdim^c\  where  dim(C) 
denotes  the  dimension  of  the  sensor  space.  The  shape  of  a  sensor  cell  may  be  very  general; 
in  fact,  it  need  not  even  be  connected  (e.g.,  a  cell  may  be  the  union  of  disjoint  subintervals). 
The  cells  C  are  intrinsically  fixed;  however,  those  cells  in  which  measurements  are  collected 
and  displayed  may  vary  from  scan  to  scan.  The  sensor  display  at  time  t  is  denoted  B(t )  = 
{Bi(t), C  C,  where  1  <  L(t)  <  S.  The  other  measurement  cells  Bc(t)  = 
{I?£(t)+i(f), ...,  Bs{t)}  =  C  \  Bit)  are  not  displayed  and  are  said  to  be  truncated.  It  is 
assumed  that  no  measurement  data  are  collected  for  cells  in  Bc(t). 

Let  T  >  1  denote  the  number  of  scans  in  a  batch  of  measurements.  The  usual  formulation 
for  a  recursive  filter  corresponds  to  the  special  case  T  =  1.  Denote  the  sensor  measurement 
vector  at  time  t  by 

Zt  =  {Ztl>  %t,L(t) } 5  ^  (l) 

where  ztt  is  the  magnitude-squared  output  of  the  sensor  at  time  t  in  the  displayed  cell 
Bi(t).  Two  kinds  of  variation  must  be  modeled,  one  systematic  and  the  other  statistical. 
The  systematic  cell-to-cell  variation  of  the  measurement  vector  Zt  is  parameterized  using 
the  PMHT  multi-target  model;  that  is,  the  unknown  cell-to-cell  variation  is  assumed  to 
be  proportional  to  a  mixture  density.  In  the  PMHT  approach,  a  target  is  either  a  single 
component  in  the  mixture  or  a  group  of  appropriately  coupled  components.  A  model  of  the 
statistical  variation,  or  probability  density  function  (PDF),  of  the  individual  measurement  zu 
must  also  be  provided.  The  procedure  adopted  here  is  to  quantize  the  data  vector  Zt  into  a 
“pseudo-histogram,”  and  then  use  a  multinomial  distribution  to  model  the  cell  counts  in  the 
histogram.  The  expected  cell  counts  are  derived  via  the  PMHT  target  mixture  model.  The 
appropriateness  of  these  models  must  be  verified  on  a  case-by-case  basis  in  the  application. 

Let  h2  >  0  be  a  specified  quantization  level,  and  let 

Nt  =  {ntl,...,nt,L(t)},  f  =  l,...,T,  (2) 

denote  the  quantized  vector  corresponding  to  Zt,  where 


(3) 


and  |zj  denotes  the  greatest  integer  less  than  or  equal  to  x.  The  use  of  the  quantized  vector 
Nt  instead  of  the  measurement  vector  Zt  is  an  intermediate  step  in  the  development.  In 
the  sequel,  after  deriving  the  auxiliary  function  of  the  histogram-PMHT  algorithm  using  the 
quantized  vectors  Nt,  the  measurement  vectors  Zt  will  be  recovered  in  the  limit  as  h2  — >•  0. 
Let 

m 

Ntx  =  (4) 

£=1 
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denote  the  total  count,  or  sample  size,  at  time  t. 


It  is  assumed  that  the  vector  Nt  has  a  multinomial  distribution  consisting  of  Nt%  inde¬ 
pendent  draws  (with  replacement)  on  L(t)  “categories”  with  probabilities 


Pi{Xt) 
P(Xt)  ’ 


(5) 


where,  for  all  cells, 


Pe(Xt)=[  f(r-,Xt)dr, 


£  =  1, 


(6) 


and 


m 

P(X,)  =  Y,  Pl(Xt), 


tel 


(7) 


where  f{r\Xt)  denotes  a  “sample”  PDF  defined  over  all  r  6  j^dini(C') ,  where  the  vector 
Xt  denotes  the  parameter  vector  of  the  sample  PDF  at  time  t.  At  the  moment,  Xt  is  not  a 
random  variable.  In  the  sequel,  / (r ;  Xt)  will  be  taken  to  be  a  Gaussian  mixture  in  which, 
just  as  in  PMHT,  the  mixture  components  correspond  to  targets. 

The  assumption  of  a  multinomial  density  for  the  quantized  data  vector  Nt  is  a  nontrivial 
assumption.  It  is  equivalent  to  the  statement  that  the  sample  counts  Nt  =  {nfl, ..., 
form  a  histogram  with  cells  Bi(t), ...,  B^) (t)  with  a  sample  size  of  Ntz,  where  the  samples  are 
independent  and  identically  distributed  (IID)  with  the  PDF  f(r;Xt)/P(Xt).  Because  Nt  is 
generated  via  a  synthetically  imposed  quantization  procedure,  the  samples  are  not  IID.  The 
histogram  model  assumes,  therefore,  that  more  information  is  available  than  the  measured 
data  Zt  can  possibly  possess.  The  synthetic  histogram  data  also  cause  difficulty  when  a 
Bayesian  model  for  the  parameters  Xt  is  adopted  because  the  abundance  of  synthetically 
generated,  but  supposedly  IID,  samples  overwhelms  the  usual  prior  density.  This  mismatch 
is  solved  (see  section  3.3  below)  by  choosing  a  sufficiently  uninformative  prior  density  to 
compensate  for  using  a  too  informative  likelihood  function  for  the  synthetic  histogram  data 
(i.e.,  the  multinomial  PDF). 

Let  N  =  {Ni, ....  Nt}  denote  the  collection  of  quantized  measurement  vectors,  and  let 
X  =  {Xi,  ...,XT}.  Then,  assuming  that  the  vectors  making  up  N  are  independent,  the 
so-called  incomplete  data  PDF  of  N  is  given  by 

T 

Pinc(X',X)  ~  PJ  Pinc{Xt',  Xt),  (8) 

t=  1 
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where 


'TO1% 

PW. 


(9) 


Pinc{Nti  Xt) 


Ntz[ 

nt,  1!  •••  nt,L(t)]- 


n 


t= i 


The  incomplete  data  likelihood  function  of  X  is  obtained  from  (8)  by  substituting  an  appro¬ 
priate,  application-dependent,  parametric  form  for  the  sample  density  f(r;Xt)  into  (9). 

If  p=.(X)  denotes  the  a  priori  density  of  X,  then  the  incomplete  data  likelihood  function 
is  given  by 


Pinc(N,X)=pz(X)pinc(N\X ),  (10) 

where  the  density  pinc(jV \X)  is  essentially  identical  to  (8),  the  only  difference  being  its 
statistical  interpretation.  A  Bayesian  prior  density  for  the  parameter  X  is  developed  in  the 
next  section  (cf.  equation  (29)),  and  it  must  be  included  in  the  incomplete  data  density 
when  the  Bayesian  viewpoint  is  adopted. 
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3.  COMPLETE  DATA  LIKELIHOOD  FUNCTION 


3.1  UNOBSERVED  CELL  COUNTS  AS  MISSING  DATA 

Missing  data  are  introduced  in  three  progressive  stages,  with  the  Bayesian  prior  density 
on  the  parameters  X  being  introduced  at  the  end  of  the  second  stage.  A  suitable  version 
of  the  general  treatment  of  Dempster,  Laird,  and  Rubin  [3,  Section  4.2]  is  followed  initially; 
later,  after  mixture  models  are  incorporated  into  the  problem,  the  more  detailed  structure 
of  McLachlan  and  Jones  [5]  is  followed.  For  a  general,  up-to-date  discussion  of  the  method 
of  EM  and  its  variants,  see  McLachlan  and  Krishnan  [8]. 


In  the  first  stage,  missing  random  variables  are  used  to  model  the  counts  in  the  unob¬ 
served,  or  truncated,  cells  in  Bc(t).  For  l  =  L(t)  + 1, ...,  S,  let  nu  denote  the  missing  count  for 
cell  Bt(t).  It  is  assumed  that  the  missing  counts  are  distributed  as  a  negative  multinomial. 
(See  Johnson  et  al.  [9,  chapter  36]  for  a  general  description  of  the  negative  multinomial  and 
related  discrete  PDFs.)  Letting 

Nt  =  (H) 


and 


■Nffi  = 

t=L{t)+ 1 

the  negative  multinomial  PDF  on  IVtc  is  given  by 

P{N‘t\Nt-xt)=- — [p (^,r,c  n  [««)]"“■ 

nt,m+ 1!  ■  •  •  Ws  -  x)!  i=L(t)+i 


(12) 


(13) 


It  is  sufficient  in  (13)  to  condition  only  on  the  total  count  Ntz  instead  of  jVt;  however,  this 
notation  is  convenient  for  the  present  purpose.  Because  p(N£\Nt]  Xt)  is  a  PDF, 

OO  00 

.  Y  p(n;\n,;X,)=  Y  ■■■  E  («) 

»t,z;(t)+ 1=°  nt,s=0 

The  conditional  mean  value  of  the  missing  cell  count  ntt  is  given  by 

OO  00  OO 

J2nap(N‘\N,-,Xt)  =  Y,  -E'"E 

Nf  ^t,L(*)+ 1=0  nt£=0  nt,5=0 

=  NtY,  Jfy*}  i  L(t)  4- 1  <  t  <  S,  (15) 

an  intuitively  reasonable  result.  Expression  (15)  is  derived  by  substituting  (13),  canceling 
ntt  in  the  ntt\  term,  and  adjusting  the  remaining  parameters  in  a  straightforward  way  so 
that  definition  (13)  can  be  re-invoked. 
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A  model  of  the  negative  multinomial  distribution  in  terms  of  realizations  of  a  (stationary) 
Bernoulli  trial  sequence  may  provide  an  insightful  and  physically  meaningful  interpretation 
for  some  applications.  Each  trial  in  the  sequence  has  S  possible  outcomes,  one  correspond¬ 
ing  to  each  cell  in  C.  As  the  sequence  progresses,  a  running  count  of  the  total  number 
of  occurrences  of  each  outcome  is  recorded,  and  the  sequence  stops  when  a  total  of 
outcomes  is  obtained  in  the  set  of  observed  cells  ...,BL{t)(t)}.  Upon  stopping  the 

trial  sequence,  counts  N£  =  {n^L(t)+i,  •••,  nf,s}  have  been  obtained  in  the  unobserved  cells 
Bs(t).  The  probability  of  these  counts  is  given  by  (13). 

Let  Nc  =  {N£, ...,  Aj.}  denote  the  collection  of  missing  measurement  count  vectors.  Using 
independence  of  the  count  vectors  in  Nc  and  Bayes  Theorem  gives  the  complete  data  PDF 
at  the  end  of  the  first  stage  as 

&(N,N‘ ;X)  =  n 

t=l 

T 

=  n  Pinc(Nt;X,)p(N;\Nt;Xt). 

t~  1 

Substituting  (9)  and  (13),  using  definitions  (4)  and  (12),  and  simplifying  the  resulting  ex¬ 
pression  gives 


where 


P<i(iv,iv';Ar)=n  T>n  [pt(xt)}n“ , 

jM  x  Ws  +  Afo-1)! 

nt,i'  nt,L(t)^  nt,L{t)+i\  •••  ^t,s!  (Ats  —  1)! 


(16) 


(17) 


It  is  clear  from  (16)  that  the  negative  multinomial  cancels  denominator  terms  of  the  form 
P(Xt)  in  the  multinomial  PDF. 


3.2  SAMPLE  LOCATIONS  AS  MISSING  DATA 

In  the  second  stage,  missing  random  variables  are  used  to  represent  the  location  of  the 
unobserved  samples  in  all  S  cells.  There  are  ntt  samples  at  time  t  in  cell  Bi(t),  so  let 

Cte  =  (C«i)  •••)  Cuntc}  C  Bi(t)  (18) 

denote  the  locations  of  the  samples  within  cell  Bt(t).  The  random  variables  in  £«  are  assumed 
to  be  IID  with  PDF  f{z;  Xt)/Pe(Xt),  and  their  domain  is  restricted  to  Be(t).  Let 

Ct =  {Cti>  •••)  Cts}) 
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and 


C  =  {Ci>  — >  Ct}- 

The  complete  data  PDF  for  the  second  stage  is  defined  to  be 

T  S  nti 

n\  (;x)=n7,nn  /(&-  *<>•  («) 

t=l  £=1  r=l 

The  marginal  density  of  (19)  over  £  is  easily  seen  to  be  identical  to  (16),  so  definition  (19) 
is  compatible  with  the  existing  statistical  structure. 

3.3  DATA-DEPENDENT  BAYESIAN  MODEL  FOR  X 

Before  continuing  to  the  third  stage  of  adding  missing  data,  a  Bayesian  model  is  adopted 
for  the  parameter  vector  X.  Following  the  usual  approach,  let 

H  —  {Ho,  =i>  •••>  Hr} 

denote  the  sequence  of  multi-target  state  vectors,  or  random  variables,  and  let 

X  =  {X0,X1,...,XT} 

denote  a  realization  of  the  state  sequence  E.  The  additional  state  vector  Ho  is  used  to  model 
the  a  priori  variable  of  Hi.  Under  a  Markov  assumption  for  H,  Bayes  Theorem  gives 

T 

j>sW=ra,Wn  (20) 

t= 1 

where  p=0(A0)  is  the  density  of  H0.  Applied  to  the  complete  data  PDF  (19),  this  approach 
yields  the  joint  density 


£ 1(V  N,  N°,  C)  =  Ps(X)  p*-±(N,  Nc,  CIV).  (21) 

The  conditional  density  in  (21)  is  obtained  from  (19)  in  the  usual  way  by  interpreting  para¬ 
metric  dependence  as  Bayesian  conditioning. 

Formulation  (21)  has  the  surprising  consequence  that  the  a  priori  density  Ps(X)  has 
no  influence  on  the  parameter  estimates  as  JVts  — >  oo  or,  equivalently,  as  the  arbitrary 
quantization  scale  factor  h2  — >  0.  The  reason  is  that  data  counts  in  the  histogram  become 
infinite  as  Ti2  — ¥  0  and,  consequently,  the  synthetically  induced  abundance  of  data  overwhelms 
the  a  priori  density.  Since  the  information  in  the  original  sensor  measurement  vectors  Zt  is 
independent  of  the  degree  of  quantization,  the  traditional  approach  is  not  the  appropriate 
way  to  pose  a  Bayesian  model  for  the  complete  data  PDF  (19). 
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An  alternative  Bayesian  formulation  is  adopted  here.  In  this  approach,  the  Bayesian 
prior  density  is  applied  to  each  event  that  generates  a  count  in  a  cell.  The  pairs  in  the  set 

=  uf=i((£«r,  C Ur)  ■  r  =  1, ...,  nte}  (22) 

are  assumed  to  be  IID  samples  of  a  joint  PDF  conditioned  on  the  state  realization  Et_i  = 
Xt~i,  denoted  by  .PstTt|st-i('>  ‘l-^Ce — 1)»  where  T <  is  the  location  random  variable  with  sample 
value  Ctfr,  and  Et  is  the  state  random  variable  with  sample  value  Applying  Bayes 
Theorem  and  assuming  that  Tt  is  independent  of  Et_i  when  conditioned  on  Et ,  gives 

P=tTt|St_!  =  PEt\St-i  Prt\stst-i 

=  PEtiSt-i  Prt\St  •  (23) 

Conditional  independence  of  sample  pairs  in  Q,t  implies  that  the  likelihood  function  of  is, 
using  (23), 


S  nt£ 

i=  1  r= 1 
S  nti 

=  nil  PHt|2t_i(£tfr|A’j_1)  Prt\3t(Ct£r\€tir)-  (24) 

£=l  r= 1 

The  Bayesian  assumption,  as  it  is  applied  in  (24),  is  that  each  measurement  (^r  is  generated 
from  a  parameter  specific  to  it,  namely  thus,  the  total  number  of  parameters  {&#•} 
equals,  or  balances,  the  total  number  of  data  points. 

To  make  maximum  a  posteriori  (MAP)  estimation  tractable  and,  more  importantly, 
to  make  the  data-specific  parameters  {&&.}  consistent  with  the  parameter  sequence  X  = 
{Ao,  Xi, ...,  Xt}i  the  parameters  {£<&•}  are  constrained  so  that 

=  Xu  for  r  =  1, ...,  nU]  t  =  1, ...,  S.  (25) 

The  constraints  (25)  apply  to  realizations  of  the  Bayesian  random  variable  models  of  the 
parameters,  not  to  the  random  variables  themselves.  Substituting  the  constraints  (25)  into 
(24)  gives  the  likelihood  function  of  the  data  fij  in  the  form 

s  ntl 

bstlS*-!  (*«-!)]  ‘S+*‘2  HI!  PTt|S«(Ct«r|*t).  (26) 

£=1  r=l 

Finally,  the  Bayesian  version  of  (19)  used  here  is  given  by 

t  s’  nu 

Pcom(X,N,N  ,  £)  =  ps(X)  f{Ct£r\Xt),  (27) 

t=l  e=l  r=l 
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where  the  obvious  identification 


Prt|Et  (Cur\Xt)  =  f(Cur\Xt)  (28) 

has  been  substituted  into  (27),  and  where  the  data-dependent  prior  density  is 

MX)=Pa,(X „)  n  (29) 

t=l 

The  exponent  of  the  density  Pso(Xq)  in  (29)  is  unity,  given  the  absence  of  synthetic  histogram 

data  at  time  t  =  0.  It  will  be  seen  that  Pe0  (X0)  is  eliminated  in  the  limit  as  the  quantization 
scale  nr  -*•  0. 

The  alternative  density  (29)  has  as  many  a  priori  density  factors  per  scan  as  there  are 
location  samples  in  the  synthetic  histogram  of  that  scan,  in  contrast  to  the  usual  density 
(20) ,  which  has  only  one  such  factor  per  scan.  Hence,  the  data  cannot  overwhelm  the  a 
priori  density  (29)  as  the  quantization  scale  h2  ->  0. 

3.4  MIXTURE  COMPONENT  ASSIGNMENTS  AS  MISSING  DATA 

The  final  stage  of  missing  random  variables  is  required  by  the  particular  sample  PDF 
considered  in  this  application.  The  sample  PDF  is  a  function  of  location  in  the  sensor  output 
space  Rdim(c\  and  it  is  assumed  to  be  the  mixture  density 

M 

f(r\Xt)  =  7T tk  Gk{r\Xt),  (30) 

k= 0 

where  the  mixing  proportions  7itk  >  0  and 


KtO  +  TTtl  +  •••  +  7 T-tM  =  1, 

and  where  Gk{r]Xt)  is  a  PDF  for  all  h,  i.e.,  it  is  nonnegative  and  its  integral  over  r  is  equal 
to  1  for  all  Xt.  Specific  parametric  forms  and  different  subsets  of  parameters  in  Xt  will 
eventually  be  identified  for  each  of  the  M  - f  1  components  in  (30),  but  doing  so  at  this  point 
in  the  derivation  needlessly  obscures  the  discussion. 

A  physical  interpretation  of  component  tt toGo(r\Xt )  is  that  7rto  represents  the  fraction 
of  the  total  power  due  to  the  background  and  Gq{t \Xt)  models  the  variation  from  cell  to 
cell  of  the  background  level.  The  remaining  components  vtlGi(r\Xt),  ...,irtMGM(r\Xt)  are 
interpreted  as  target  models  in  which  wtk  is  the  fraction  of  total  power  due  to  target  k  and 
Gk{r\Xt)  models  the  cell-level  variations  of  target  k.  The  parametric  form  (30)  assumes  that 
a  target’s  power  level  may  be  spread  across  more  than  one  cell  of  the  sensor  display.  The 
parametric  form  of  the  spread  functions  Gk(r\Xt )  need  not  be  specified  until  later. 
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A  missing  variable  k^T  is  used  to  specify  which  component  of  the  mixture  generated  the 
missing  variable  Q t£r,  so  that  0  <  ktir  <  M.  It  is  assumed  that  kUr  is  a  random  variable  with 
discrete  PDF  specified  by  {7rt0,7rtl)  ...,7rtM}-  Hence,  if 

Kti  =  {kti i)  •••)  ktznu}  for  i  =  1, S  and  t  —  1, T, 

then  all  variables  in  Ku  are  IID.  Let  Kt  =  {Kti,  •■■■,  Kts}  and  K  —  {Ki, K?}-  Extending 
the  density  (27)  to  include  K  gives,  finally,  the  complete  data  PDF  at  the  end  of  the  third 
stage  as 

t  s  nte 

Pg>m(x,Af,^,c,^)=fe(x)n  » n  n  <3i> 

t= 1  e=l  r=l 


where 


fk(r\Xt)  =  irtk  Gk(r\Xt )  (32) 

is  used  in  (31).  The  dependence  of  fk{r\Xt)  on  the  mixing  proportion  7rtk  is  implicit  in 
the  abbreviated  notation  (32).  Summing  (31)  over  all  K  gives  (27)  after  substitution  of 
the  mixture  (30),  so  the  additional  random  variables  K  are  compatible  with  the  existing 
structure. 
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4.  E-STEP 


4.1  CONDITIONAL  EXPECTATION  OVER  ASSIGNMENTS  K 


In  the  E-step  of  the  EM  method,  the  so-called  auxiliary  function  Qn  is  evaluated  as  a 
conditional  expectation  of  the  logarithm  of  the  complete  data  density  (31).  The  required 
expectation  is  with  respect  to  the  missing  data  {Nc,  (,  K},  and  it  is  conditioned  on  N  and 
a  current  value  of  X,  denoted  by  X'.  Explicitly, 

Q*  =  Em,(k  [{logj>«,pr, N, N*. c, K)}  \N,X'},  (33) 

where  denotes  the  expectation  with  respect  to  the  missing  data.  The  conditional 

density  of  the  missing  data  is  obtained  by  using  Bayes  Theorem,  the  complete  data  density 
(31),  and  the  incomplete  data  density  (10).  After  algebraic  manipulation,  the  result  is 


p(NcX,K\N,X')  = 


pi3l(X',iV,IVc,C,^) 


Pinc(N,X> ) 


s  nti 


fkuAQAXl) 


n 

fc 1  fcl  r=l  tK  t/ 


where  p(N£\Nt,  X')  is  the  negative  multinomial  density  (13).  Substituting  (29)  and  (31)  into 
(33),  and  dropping  the  terms  because  they  do  not  depend  on  X,  gives 

T 

Qn  —  Enc  log.Ps0 (Xp)  +  £  N^tal  logpEt|Ht_i(At|At_i)  N,X' 

t= i 

'  T  S  nte 

+  Enc^k  E  £  X lo«  fc  K uAXt)\N,X'\,  (35) 


fcl  fcl  r= 1 


where 


NT1  =  Ntx  +  7VtcE,  t=l, ...,  T. 


The  second  expectation  in  (35)  is  evaluated  as  three  nested,  conditional  expectations: 

«»  =  £„<[£([£*[{•}  |  AT,*']  ]  ]•  (36) 

The  outermost  expectation  in  (36)  and  the  first  expectation  in  (35)  are  both  with  respect  to 

Nc. 

The  innermost  expectation  in  (36)  is,  by  definition, 

t  s  nt( 

Ek  =  Ek  EEE  i°gfc„(c«.|x;)  w,*' 

_t=l  fcl  r=l 

{T  S  ntt  } 

EEE  l°gA„r(C<fr|X.)  [  p(NcX,K\N,X'). 

fc 1  fcl  r= 1  J 
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Interchanging  the  sum  over  K  with  the  triple  sum  over  {t,£,  r},  substituting  (34),  and  then 
pushing  the  sum  over  K\kter  further  inside  the  summand  gives 

T  S  nti 

E«  =  EEEE  log/fewr(Ctfr|*t)  p(Nc,  C,  K\N,  X') 

t=l  e=l  r=  1  K 


T  S  ntl  (  M 


=  EEE1  E  p  (X1)  fktir(Cter\Xt)  log  fktlr (Qer\Xt) 

t=  1  £=1  r=  1  lkUr= 0 


X  ^p(JVc,  C,K\N,X') 

K\kttr 


It  is  straightforward  to  see  from  (34)  that 


e  p(nc>  c,  k \n,  x') = w.  v!)  n  ii 

t=l  1=1  r=l 


The  absence  of  a  subscript  on  the  sample  density  (38)  is  a  direct  consequence 

of  marginalizing  over  K.  This  completes  the  conditional  expectation  with  respect  to  K. 

4.2  CONDITIONAL  EXPECTATION  OVER  LOCATIONS  C 

The  conditional  expectation  of  EK  with  respect  to  the  collection  of  all  sample  locations 
C  is  defined  by  the  multiple  integral 


-/"•/ 


Ek  dC, 


where  there  are  as  many  integrals  in  (39)  as  there  are  indices  in  K.  For  all  indices  {t,£,r}, 
the  domain  of  integration  of  QUr  is  Be(t).  Substituting  (37),  pushing  the  multiple  integral 
inside  the  triple  sum  over  {t,£,  r},  and  then  arranging  the  integrals  to  match  corresponding 
sums  over  K  gives 


T  S  ntl  (  M 

E<  =  EEE  E 

t=  1  £=1  r= 1  U«r= o 


PtW) 

f kt£r  (Cttr\Xt)  log  fkUr  (Citr  |^t)  d^tir 


[■■■  [  E  p(NC’  c,  m, x')  d( c^).  > 

c\ d 
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From  (38)  and  the  definition  (30),  it  is  seen  that 


Expression  (40)  is  simplified  by  performing  the  following  steps:  (1)  substitute  (41)  into  (40); 
(2)  change  kt(T  and  Qtr  to  the  dummy  variables  k  and  r,  respectively;  (3)  make  the  sum  on 
k  the  outermost  sum  instead  of  the  innermost;  (4)  collect  terms  to  eliminate  the  exception 
in  the  product  and  recover  the  negative  multinomial  density;  and  (5)  recognize  that  the 
summand  no  longer  depends  on  r  and  so  is  equivalent  to  a  multiplication.  The  result  is 


M 


Er 


£££ 

fc= o  t= i  e=i 


nu 


n  *w;>  I® 


t= 1 


This  completes  the  conditional  expectation  with  respect  to  £. 


4.3  CONDITIONAL  EXPECTATION  OVER  COUNTS  Nc 

The  auxiliary  function  Qh  is,  from  (35),  the  sum  of  two  terms,  namely, 

Qn  =  ENc  [logps(V)|V  X']  +  ENc  [Ec]  ,  (43) 

where  the  data-dependent  a  priori  density  ps{X)  is  given  by  (29)  and  is  given  by  (42). 
The  first  term  in  (43)  is  a  conditional  expectation  with  respect  to  Nc  =  {iVf, ...,  N%},  where 
N£  is  given  by  (11);  however,  the  second  term  is  an  unconditional  sum  over  all  Nc  because 
the  required  conditioning  is  implicit  in  the  definition  of  E$. 


Both  terms  in  (43)  require  evaluating  the  quantity  nte,  where 

T 

nu  =  £w.[n«|JV,X']  =  ]T  ntl  []  p(Nt\N-„ X'J . 


(44) 


Nc 


t=  1 


The  expectation  in  (44)  simplifies  in  one  of  two  ways.  If  1  <  i  <  L(t),  then  nte  =  nt£  because 
nti  is  not  involved  in  the  sum  over  Nc,  and  because 


£  rbw  1% xi)  =  I  £pWI«i,Xi)  )  •  •  •  |  £p(N}INt,  X't)  ]  ,  (45) 

N'  i=l  \  Nf 


evaluates  to  1,  using  the  PDF  identity  (14).  If,  however,  L(t)  +  1  <  l  <  S,  then 


n,r  =  £  nap(N;\NuX't) 


NC\N£  t=  1 
t^t 


(46) 
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The  term  in  brackets  in  (46)  equals  1,  as  is  seen  by  factoring  the  sum  as  in  (45);  therefore 
from  (15),  ' 


_  I  %  for  1  <  l  <  L(t) 

=  1  for  L(t)  +  1  <1<S.  <47> 

Truncated  cells  are  seen  from  (47)  to  contribute  to  Q%  in  proportion  to  the  expected  number 
of  measurements  in  those  cells;  thus,  the  negative  multinomial  PDF  is  a  kind  of  extrapolation 
procedure  to  compensate  for  truncated  cells. 


By  linearity  of  the  expectation  operator, 

=  £V[Arffi  +  A?£|A',,A';] 

=  Nts  +J2  Nk  n  p(Nm,Xt) 


Nc 


i=l 


—  NtTt  +  ntl ■ 


Substituting  (47)  and  using  the  identity 


P(X[)  +  J2  P‘(Xl )  =  EP'W)  =  [  f{r\X't)dr  =  1 

i=L{t)+ 1  1=1  v/i?dlm(c) 


gives  the  result 


E[Nl$“\NuX[}=-^- 


(48) 


The  results  (47)  and  (48)  enable  the  computation  of  (43). 

The  first  term  of  (43)  is  evaluated  from  (35)  using  linearity  of  the  conditional  expectation 
operator  and  the  limit  (48)  to  obtain 


£«*p0gfeM|JV<,  XQ  =  logpzJAoi  +  £  logPH,|E,_,  (A'.iAVi) 


(49) 


The  second  term  of  (43)  is  evaluated  from  (42)  using  linearity  of  the  expectation  operator 
and  substituting  (47)  to  obtain 


MTS  _ 

=  P777T  /  fk(r\Xl)  log fk(T\X,)dT.  (50) 

fc=0  t= 1  1=1  e^t)  J B((t) 

The  auxiliary  function  Qh  is  the  sum  of  (49)  and  (50). 
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4.4  LIMIT  OF  Qn  AS  QUANTIZATION  SCALE  h2  -»•  0 

It  is  now  shown  that  the  synthetic  histogram  is  eliminated  by  taking  the  limit 

Ql  =  lim  h2  Qn- 

7i2->0 


From  the  definition  (3)  of  nte,  it  follows  that 


lim  h2  rite  =  lim  h2  -4?  I  =  zte- 
*  ft2-+o  ft  1 


From  (3)  and  (4)  it  follows  immediately  that 


lim  ft  Ntx  =  lim  <  ft 

h2-+0  h2->0 


{"W 

h2  T,  S 

=  W  lim  {a2  ^ 

l  La  J 


■  IIZ.il ,  (53) 

where  ||-||  denotes  the  so-called  Li-norm  defined  by 

L(t) 

Ztt‘  (^4) 

i=i 

If  zti  is  magnitude-squared  data,  then  \\Zt\\  is  a  sum  of  squares  (i.e.,  it  is  proportional  to 
power).  Using  (53),  it  is  reasonable  to  define  the  expected  measurement  zte  as 

zte  =  lim  ft2  nu 

h2~&0 

[  Ztt  for  1  <  t  <  Lit) 

~  {  IIZ.II  %§  for  L(t )  +  1  <(<S.  (55) 

The  limit  (51)  is  evaluated  term-by-term  using  expressions  (52)  and  (55).  The  quantity 
h2  log p-=0  ( A0)  goes  to  0  in  the  limit,  and  the  final  result  is 


=  £  -Mk  log  Pit  I  =t- 1  (Xt  I  At-  !  ) 


MTS 


+  EEE 


k= 0  t=  1  t-- 


r  wo  JbM 


Mt\XD  log/*(r|X,)  dr. 


This  completes  the  evaluation  of  the  limiting  form  of  the  auxiliary  function.  The  limiting 
form  Q*  uses  measured  sensor  data,  not  the  synthetic  histogram  data. 
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5.  AUXILIARY  FUNCTION  FOR  HISTOGRAM-PMHT 

Target-specific  subsets  of  the  multi-target  state  and  parameter  vectors  are  now  identified. 
The  multi-target  state  vector  Et  is  partitioned  as 

■— 't  =  'tOj  ■— ‘ilj  “ 'tM~\  i  t  0,1,  •••> 

where  Etk  is  the  state  of  target  k  at  time  t.  The  corresponding  target  parameters  are  given 
by 


Xf  —  Xd, ...,  xtM}  j  t  —  0,1, ...,  T , 

where  xto  is  the  parameter  of  the  background  density  at  time  t,  and  xtk  is  the  parameter  of 
target  k  at  time  t.  There  is  no  need  to  specify  a  parametric  form  for  ps0  (X0)  because  it  is 
absent  from  QK  Assuming  that  targets  and  background  are  independent  at  all  times, 

M 

PEt\E.t-i{Xt\Xt-l)  —  |"J  Pstk\3t-ltk(xtk\xt-l,k)  ■  (57) 

k=0 

The  abbreviated  notation  in  (32)  rewritten  in  single  target  style  becomes 

fk(r\Xt)=TtkGk(r\xtk).  (58) 

Mixing  proportions  and  target  parameters  are  to  be  estimated. 


Let  x'tk  and  ir'tk  denote  current  values  of  the  target  parameters  and  mixing  proportions, 
respectively.  Substituting  (57)  and  (58)  into  (56)  gives 


where 


and 


M 


Q'=  IN“  +  X(3**' 


(59) 


i=l 


k= 0 


M 


QtTT  —  Yh 


k= 0  U= 


i^LGk{TW,k)iT. 


Kk  *°S7rtfc 


(60) 


T  ^  ||  zA\ 

Qkx  =  Y^TF' 7  l°g  Pst  |Ht-  i  (^tfc  I  xt- i  ,fc ) 

t=l  F(xt) 

+  YY  pk(Y'\  [  Gk(T\xtk)  logGk(r\xtk)  dr,  (61) 

t=l  i=i  ^(  J Bi(t) 

where  \\Zt\\  and  zti  are  given  by  (54)  and  (55),  respectively. 
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The  auxiliary  function  (59)  holds  for  general  parametric  forms  of  the  Markovian  densities 
PEt\2t-i(xtk\xt-i,k)-  It  also  holds  for  general  parametric  forms  of  both  the  target  densities 
Gfc(r \xtk),  k  >  1,  and  the  background  density  Go(r\xtk). 

Different  parameterizations  will  typically  be  used  in  most  applications  to  model  back¬ 
ground  structure  and  target  behavior.  For  example,  if  the  background  model  does  not  use  a 
state  process  model,  and  it  is  stationary,  so  that  Xto  =  for  all  t,  then  the  contribution  in 
(61)  for  the  background  term  k  =  0  becomes  simply 

t  s  ,  r 

Qox  =  5^2  pW( yi  /  G0(r\x'0)  log G0(t\x0)  dr.  (62) 

t=  1  1=1  J  Bt(t) 

Estimates  of  the  stationary  background  parameter  Xq  and  the  target  parameters  would  then 
be  derived  from  (62)  and  Qkx,  1  ^  k  ^  M.  Alternatively,  if  the  background  is  not  estimated 
because  it  is  already  normalized  in  some  application-specific  manner,  then  the  term  Qox  is 
omitted  from  Q t  altogether. 
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6.  M-STEP 


6.1  MIXING  PROPORTIONS:  GENERAL  CASE 

An  expression  for  updated  mixing  proportions  is  derived  from  the  general  expression 
(60).  There  is  no  need  to  specify  particular  parametric  forms  for  the  target  and  background 
densities  Gk(r\x'tk).  The  update,  denoted  by  ntk,  is  derived  by  maximizing  (60)  subject  to 
the  natural  constraint  on  the  sum  of  the  7rtfc’s.  The  appropriate  Lagrangian  function  is 


M 

E 

k= 0 


plT7)  f  Gk^x'tk)  dr 


M 


k)g  TTtfc  +  A t  1  ~ 


T^tk 


k=Q 


where  At  is  the  Lagrange  multiplier.  Differentiating  with  respect  to  7rtk  and  solving  for  Trtk 
gives  the  update 


nk  =  ^  pjb)  [  Gk^x'^  dr 


(63) 


Summing  (63)  over  all  k  and  invoking  the  equality  constraint  gives  the  Lagrange  multiplier 


as 


M 


~  ^2 1* tk 


k=0 


\H  p%J)  [  Gk{r\x'tk)dr 

[7^  Fi\xt) 


(64) 


Substituting  (64)  into  (63)  gives  the  updated  estimate  tt tk. 


Additional  linear  constraints  are  easily  imposed  on  the  method.  For  example,  it  may  be 
desirable  to  require  the  mixing  proportions  to  be  stationary;  that  is,  7ctk  =  nk  for  all  t.  The 
resulting  estimator  requires  summing  over  l  and  t  and  adjusting  the  normalization  factor 
accordingly.  Such  constraints  are  application-dependent  and  are  not  pursued  here. 


6.2  STATE  ESTIMATES  FOR  LINEAR  GAUSS-MARKOV  CASE 

Parameterizations  appropriate  for  the  special  case  of  linear  Gauss-Markov  target  pro¬ 
cesses,  linear  measurement  models,  and  known  background  are  assumed  in  this  section.  The 
measurement  and  target  covariance  matrices  {Rtk}  and  {Qtk}  are  assumed  known  through¬ 
out  this  section.  The  case  of  unknown  covariance  matrices  is  treated  in  the  following  section. 

For  k  =  1, ...,  M  and  t  =  1,  ...,T,  the  target  process  models  are 

fatkl-Et— i,fc)  =  N(xtk,  Ft— itkXf—itk,  Qt-i,k),  (6®) 

and  the  measurement  models  are 

Gk(r\xtk)  =  N(t-,  Htkxtk ,  Rtk),  (66) 
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where  N(r;  /j,,  E)  denotes  the  multivariate  Gaussian  PDF  with  mean  vector  /j,  and  covariance 
matrix  E.  The  background  density  is  assumed  to  be  completely  known  for  all  t;  that  is, 

Go(r\xto)  =  G0(T]xt0),  (67) 

where  xt0  are  known  constants.  Consequently,  Qox  is  omitted  from  QK 


For  k  >  1,  substituting  these  forms  into  Qkx  and  dropping  determinant  terms,  since 
they  do  not  depend  on  { xtk},  shows  that  —2Qkx  is  a  discrete-continuous  sum  of  weighted, 
squared  errors.  Explicitly,  the  total  squared  error  is  given  by 


2  Qkx  —  'y  ^  ~  Qt-ijk 

+  EE  JK  [  N(r;  Ra) 

t=  i  e=i  J B((t) 


x  (t  -  Htkxtky  Rt£  (t  -  Htkxtk)  dr , 
where  asterisks  denote  vector  and  matrix  transpose.  Let 


(68) 


X (k')  -DA:)  •••>  % Tk }  • 

Using  the  general  gradient  identity 

V*  (Fx  -  n)*  E"1  (Fx  -  ju)  =  2  F*  E_1  (Fx  -  n) 

to  take  the  gradient  of  the  total  squared  error  with  respect  to  each  of  the  vectors  in  X ( k ) 
gives  the  necessary  conditions  for  minimizing  -2 QkX.  It  is  assumed  that  this  linear  system 
of  equations  is  full  rank,  so  that  the  necessary  conditions  are  sufficient  also  in  this  case. 


The  linear  system  takes  the  form  r^J^sT (fc)  =  bk,  where  r*  is  a  symmetric  and  block 
tridiagonal  matrix  of  size  (T  +  1)  x  (T  +  1)  blocks  and  the  right-hand  side  is  a  compatibly 
partitioned  vector.  After  algebraic  manipulation,  the  system  matrix  is  given  explicitly  as 

Dok.  —Bok  0  •••  0  0  0 

~Bok  Aik  -I-  Dik  —Bik  0  0  0 

’  *  *  *  •  •  C  5 

^  0  0  ~Bt-2,h  A T-i,k  +  Dr-i,k  —Br-i,k 

0  0  0  0  -B*T_1>k  ATk 

where  the  block  matrices  Atk,  Dtk,  and  Btk  are  given  by 

Ak  =  +  Hlk  Htk  for  1  <  t  <  T, 

Dtk  =  Ftl  (4*)  Ftk  for  0  <  t  <  T  —  1,  (69) 

Btk  =  Ftk  (Qtfc)  for  0  <  t  <  T  —  1, 
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(70) 


and  the  synthetic  target  and  measurement  covariances  used  in  (69)  are 

P(Xl+i) 


Qtk  — 


Qtk  for  0  <  t  <  T  -  1, 


and 


Rtk  = 


Rtk 

<k  ytk 


for  1  <  t  <  T, 


(71) 


where 


^  =  E  pTW  f  n{-t '  B***'  **)  dr'  <72) 

The  right-hand  side  of  the  system  is  a  vector  that,  partitioned  into  T  +  1  blocks,  is  given  by 


m 

0 

-1 

bk  = 

rr* 

"1  k 

[Rik) 

/  ~  \ 

Zlk 

-1 

_Hk  1 

I  zTk 

where  the  synthetic  measurements  used  in  (73)  are  given  by 


(73) 


Ztk 


1  Zti  f 

~Wk  jrt  PtW)  Jbm 


t  N(t ;  Htkx'tk.  Rtk)  dr. 


(74) 


The  solution  of  the  linear  system  TkX(k)  =  bk  gives  updated  state  estimates,  denoted  by 
X (k)  =  {T0fc)  %ik,  xTk}.  It  can  be  solved  using  an  appropriate  form  of  Gaussian  elimination 
for  symmetric,  block  tridiagonal  matrices. 


Alternatively,  expressions  (69)  and  (73)  are  in  precisely  the  form  needed  to  show  that  a 
nonstationary  Kalman  smoothing  filter  with  a  diffuse  prior  PDF  for  the  state  at  time  t  =  1 
can  also  compute  the  same  state  estimates.  This  numerical  equivalence  is  seen  from  the 
expression 


exp(<2*x)  oc  JJiVOctfc;  i't-i.fcZt-i,*,  Qt-i,k)  N(ztk\  Htkxtk,  Rtk),  (75) 

t=i 

a  result  obtained  from  Qkx  by  algebraically  completing  the  squares  on  the  variables  in  A  (A:). 
Maximizing  (75)  over  X (k)  =  {x0fc,  iu, ...,  xTk}  is  equivalent  to  minimizing  the  total  squared 
error  (68)  over  the  same  quantities,  i.e.,  equivalent  to  solving  the  system  TkX(k)  =  bk.  An 
important  feature  of  the  equivalent  smoothing  filter  is  that  it  uses  synthetic  measurements 
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(74)  and  synthetic  covariance  matrices  (70)- (71),  not  the  given  measurements  {zte}  and 
originally  specified  covariances  {Qtk}  and  {Rtk}- 

The  synthetic  measurements  Ztk  form  a  probabilistic  centroid  over  the  sensor  cells  C.  To 
see  this,  let 


R-tk%tk’  Rtk)  dr 

fs((t)  ^(T'>  R-tk%tki  Rtk)  d,T 


(76) 


denote  the  cell-level  centroids  for  target  k  at  time  t.  Because  ztkl  is  the  mean  of  a  PDF 
whose  support  is  confined  to  the  cell  it  follows  that  ztkl  €  B£(t)  if  Be(t)  is  a  convex 

set.  It  is  easily  verified  that 


Eli 

pe(X't)  fBt(t)N{r,  Htkxtk,  Rtk)  dr 

^tkl 

Eli 

i  i 

of 

Saf 

5a; 

HO 

1 _ 1 

(77) 


is  equivalent  to  the  definition  (74).  The  synthetic  measurement  ztk  is  a  convex  combination 
of  the  cell-level  centroids  {ztk i,  —,ztks}  in  (77). 


The  cell-level  centroids  (76)  may  be  a  convenient  point  in  the  calculation  to  deal  with  some 
of  the  numerical  dynamic  range  issues  that  may  arise.  For  example,  in  the  one-dimensional 
case  when  cells  B£(t)  are  intervals,  it  is  straightforward  to  prove  the  identity 


fa  r  W(r;  IB  ^2)  dr 
fa  N(r>  ^  °2)  dr 


=  H  —  a 


V  ir  ) 


(78) 


The  denominator  is  a  difference  of  error  functions,  so  evaluating  the  cell-level  centroids 
in  the  one-dimensional  case  seems  both  fast  and  accurate.  However,  when  the  mean  fj,  is 
separated  from  the  interval  [a,  b]  by  many  multiples  of  cr,  numerical  underflow  may  cause 
both  numerator  and  denominator  of  the  ratio  on  the  right  hand  side  of  (78)  to  become  zero. 
An  alternative  approach  is  to  show  by  a  change  of  variables  that 

_  c  |  f^A  T_  iV(r  + c;  ib  cr2) 

f-A  N(r  +  c'>  <?2)  dr 


where  [a,  6]  =  [c  -  A,  c  +  A],  Expanding  numerator  and  denominator  in  a  Maclaurin  series, 
substituting  the  identity 


d?_ 

drn 


N(t;  it,  a2) 


(-1)" 

Gn 


N(r ;  n,  a2)  Hen 


n  —  0,1, 2, 
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where  Hen  (•)  is  the  Hermite  polynomial  of  degree  n,  and  canceling  the  common  factor  of 
N(r ;  yu.,  £T2)  gives  a  standardized  (dimensionless)  ratio  in  the  form 

7  —  c  /A\2  /£”o,2„+3 )U+1).  (f)2n  £f^l  (?)) 

^  V17/  (  £“=0  (2n+l)(2n)!  (tU  #e2"  (“)  / 

Rational  approximations  of  desired  accuracy  are  obtained  by  truncating  in  the  series;  how¬ 
ever,  asymptotic  approximations  are  preferable  for  sufficiently  large  values  of  Approx¬ 
imations  derived  by  methods  such  as  these  may  be  especially  useful  in  higher  dimensional 
applications. 

6.3  COVARIANCE  ESTIMATES 

Measurement  and  target  covariance  matrices  are  assumed  unknown  in  this  section,  and 
an  estimation  algorithm  is  derived  by  the  generalized  EM  (GEM)  method.  The  E-step  of 
the  GEM  method  is  the  same  as  the  E-step  of  the  EM  method.  Consequently,  the  mixing 
proportions  {tt^}  are  estimated  in  exactly  the  same  way  as  discussed  earlier. 

The  terms  in  the  Qt  function  not  involving  mixing  proportions  constitute  a  function  of 
the  form  Q*(X,  {Qtk},  {Rtk})-  The  M-step  of  the  GEM  method  requires  solving  the  problem 

max  QS(X,  {Qtk},{Rtk})-  (79) 

X,Q,R 

Unfortunately,  the  necessary  equations  obtained  by  differentiation  with  respect  to  the  state 
and  covariance  variables  are  coupled,  so  the  EM  method  is  difficult  to  use.  By  replacing  the 
maximization  (79)  with  two  nested  maximizations 

max  {max  Q*{X,  { Qtk },  {#**}))  ,  (80) 

it  is  readily  verified  that  the  Q  function  is  necessarily  increased,  even  though  it  is  not 
maximized.  Any  increase  is  sufficient  to  ensure  that  the  convergence  properties  of  the  EM 
method  apply  here  as  well  [3,  8].  The  covariance  matrix  estimators  obtained  in  this  way 
using  (80)  are  therefore  GEM  estimators  rather  than  EM  estimators. 

Let  {xofc, •••,  %Tk}  denote  the  updated  state  estimates  obtained  as  in  the  previous 
section  for  current  estimates  of  mixing  proportions,  states,  and  covariances.  The  notation 
of  the  total  squared  error  expression  (68)  must  be  adjusted  slightly  for  the  present  context; 
that  is,  N {t\  Htkx'tk,  Rtk )  must  be  replaced  by  N(r;  Htkx'tk,R'tk),  as  is  easily  seen  from  the 
derivation  of  Qkx,  and  xtk  must  be  replaced  by  xtk  because  of  the  nested  maximization 
(80).  Taking  the  gradient  of  the  adjusted  total  error  expression  with  respect  to  Rtk  using 
the  general  matrix  gradient  identity 

log  N(x]  ji,  E)  =  -IT1  +  E-1(:r  —  /i)( x  —  n)* E-1 
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and  solving  for  Rtk  gives  the  estimator 


E?=i 

Pi{x't)  S B{(t)  -^(r>  Htkxtk’  Rtk)  dT 

Rtki 

El 

=1 

Pe(X't)  fj B((t)  -^(T>  Htkxtk’  Rtk)  dT 

where  the  cell-level  measurement  covariance  matrix  contributions  are  defined  by 

N(t,  Htkxtk,  Rk)  ( t  —  Htkxtk)  (r  —  Htkxtk )  d,T 

“  Jbm  N(t’  H^'tM  dr  ' 


The  estimator  (81)  cannot  be  full  rank  unless  S  >  dim(C).  Also,  numerical  considerations 
analogous  to  those  mentioned  at  the  end  of  the  previous  subsection  apply  to  computing  Rtki- 
Further  details  are  not  pursued  here. 


If  all  the  measurement  covariances  are  required  to  be  the  same,  so  that  Rtk  =  R  for  all 
t,  then  the  same  procedure  gives 


p((X't )  /s^t)  N(r,  Htkxtk.  Rtk)  dr 

Rtki 

Et=i  ^ tk  Ei=i 

pt(x’t)  fsrft)  N(r,  Htkxtk,  Rk)  dr 

(82) 


The  estimator  (82)  cannot  be  full  rank  unless  ST  >  dim(C).  Both  estimators  (81)  and  (82) 
are  convex  (discrete-continuous)  combinations  of  outer  products  of  innovations. 


A  similar  method  leads  to  target  covariance  matrix  estimates.  Taking  the  gradient  of  the 
total  squared  error  (68)  with  respect  to  Qt-i,k  and  solving  for  Qt-\,k  gives 

Qt-l,k  =  {xtk  ~  Rt-l,kxt-l,k)  (xtk  ~  Rt-l,kxt—l,k)  •  (S3) 

The  estimator  (83)  is  full  rank  only  if  dim(x^)  =  1.  When  dim(xtfc)  >  1,  taking  the  gradient 
of  the  total  squared  error  under  the  additional  constraints  Qtk  =  Qk  for  all  t  gives 

Cr  Et=l  P(xf)  (Xtk  —  Rt-l,kxt-l,k)  {xtk  ~  Ft-l,kxt-l,k) 

Qk  —f  p^j  •  (84) 

2^t=\  ¥{Xd 

The  estimator  (84)  cannot  be  full  rank  unless  T  >  dim^**). 

Additional  constraints  can  be  imposed  on  the  problem  if  desired.  For  example,  if  the  k- th 
target  and  measurement  processes  are  stationary,  then  Rtk  =  R  and  Qtk  =  Q  for  all  t.  Linear 
constraints  such  as  this  lead  to  estimators  that  average  over  both  t  and  t,  so  the  resulting 
covariance  estimators  are  more  likely  to  have  full  rank.  Alternatively,  Wishart  matrix  prior 
PDFs  can  be  utilized  to  guarantee  that  the  estimated  covariance  matrices  have  full  rank  [10]. 
Such  variations  are  application  dependent  and  are  not  pursued  here. 
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7.  STATEMENT  OF  THE  HISTOGRAM-PMHT  ALGORITHM 


Many  application-specific  variants  of  the  histogram-PMHT  algorithm  are  easily  derived 
from  the  above  analysis.  The  assumptions  of  the  specific  algorithm  given  in  this  section 
are  now  reiterated.  The  sensor  cells  C  are  known  and  of  fixed  size  and  shape.  The 
numbers  (L(l),  ...,L(T)}  of  displayed  cells  are  given,  and  the  displayed  cell  list  B(t)  = 
{Bi(t),...,BL(T)(t)}  is  known  for  t  =  1,...,T.  The  measurement  vectors  {Zi,...,ZT}  are  in 
the  form  specified  by  equation  (1). 

The  target  covariance  matrices  are  stationary;  that  is,  only  one  covariance  matrix  is  esti¬ 
mated  per  target.  The  measurement  covariance  matrices  are  non-stationary,  so  a  covariance 
matrix  is  estimated  for  each  target  at  each  time.  The  T  background  (noise)  PDFs 

{(?o(r;^io),  ...,  Go(t]Xto)} 

and  their  corresponding  parameters  {zio,  £20,  — ,  Xto}  are  given.  For  computational  efficiency, 
the  numerical  values  of  the  quantities  {||Zij| , ...,  ||^r||}  and  the  integrals 


are  precomputed. 


xto)  dr, 


£  —  1,  ...,S  and  t  =  1,...,T, 


All  target  models  are  assumed  to  be  observable;  that  is,  a  unique  point  estimate  exists  for 
each  target  state  given  the  measured  data.  This  assumption  may  impose  mild  restrictions  in 
some  applications.  In  some  kinematic  applications,  for  example,  measurements  of  position 
are  used  to  estimate  a  target  state  vector  comprising  both  position  and  velocity.  In  this 
case,  it  is  clearly  necessary  to  have  at  least  two  scans  in  the  batch,  unless  an  informative  a 
priori  state  density  is  available.  For  the  single  batch  histogram-PMHT  algorithm,  a  priori 
information  is  absent,  so  in  this  case  it  is  required  that  T  >  2.  In  contrast,  a  single  scan  can 
be  used  in  the  recursive  histogram-PMHT  batch  algorithm  because  it  uses  an  informative 
a  priori  density  obtained  via  earlier  measured  data;  thus,  the  single  batch  algorithm  is 
essentially  an  initialization  procedure  for  the  recursive  batch  algorithm.  General  conditions 
for  observability  of  the  target  state  are  not  pursued  further  here. 


7.1  SINGLE  BATCH  HISTOGRAM-PMHT  ALGORITHM 

7.1.1  Initialization 

Initialize  mixing  proportions  {?<?}  so  that  Tr'ff.1  >  0  and 

if +  if,',0’  +  ...  +  ifffi  =  1  for  t  =  1, ....  T. 

For  target  models  k  =  1, M,  initialize 

r^.(0)  ^(0)  ~(o)  *, 

target  state  sequence:  \xokixik>  •••>  xTkh 

measurement  covariance  sequence:  {R^,  R^,  •••, 
target  covariance:  . 
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The  initial  covariance  matrices  are  assumed  to  be  valid  covariance  matrices;  that  is,  they 
must  be  symmetric  and  positive-definite. 


7.1.2  Iteration 


Let  i  denote  the  histogram-PMHT  iteration  index.  For  i  =  0, 1, 2, ...,  compute  the  fol¬ 
lowing  13  quantities: 

1.  Target  cell  probabilities  for  t  =  1, ...,  T;  t  -  1, S\  and  k  =  0, 1, ...,  M  : 

p(i+i)  _  (  fst(t)  Go(t;  Xto )  dr,  k  =  0, 

m  ~  1  L,(„  N  (’■;  Htt  -R®)  dr,  k  =  1, M. 

2.  Total  cell  probabilities  for  t  =  1, ...,  T  and  £  =  1, ...,  S  : 


p(i+ 1)  _  V  -Wp(i+1) 
k~0 

3.  Total  sensor  probabilities  for  t  =  1, ...,  T  : 

L(t) 

p(i+l)  _  ^  '  p(i+ 1) 
t=l 

4.  Expected  measurements  for  t  =  1, ...,  T  and  i  =  1, ...,  5  : 


_(i+i)  _  I 

~  1  II  7 


H  Pt?+1)  M(m) 


1  <  ^  <  T(t) 

L(t)  +  1  <  l  <  S. 


1,. 

..,  5;  and  k  = 

=  1,.. 

.,M 

N 

(t; 

m 

1  dr. 

*tkt 

6.  Synthetic  measurements  for  t  =  1, T  and  k  =  M  : 


5<*+i)  _ 

— 


5  |-(z+l)  /  p(i+l)  /  p(*+l)\l  *r(i+l) 


Eti 


1  tl 


Ef=i  ^ 


r(i+l)  /  p(i+ 1)  I  n(i+l) 
a  \  rtu  /  -r« 


7.  Synthetic  measurement  covariance  matrices  for  t  =  l,  ...,T  and  k  =  1, ..., M  : 


>(i+i)  _ 


V'5  -(*+!)  f  p(j+l)  /  p(i+l)  A 

/l tfc  Z^=i  \rtu  j  rti  j 
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8.  Synthetic  target  covariance  matrices  for  t  =  0, 1, T  —  1  and  k  =  17 M  : 

^ tk  II  *7  II  ' 


PXi’  m 
Pmll  Qk  ■ 


9.  Estimated  mixing  proportions  for  t  =  1, T  and  k  =  0, 1, M  : 


7^+!)  - 

"tk  ~ 


Si(0  ■=(i+1)  f  p(*+!)  /p0+1)'\ 

"tk  2_/&=  1  \jtkl  j  rtl  ) 

£:(0  W'5  p(*+1)  /  p(i+1)  ^ 

Z^fc'^o  "tk'  2^e= i  y*  tk't  j  rti  ) 


10.  Estimated  target  states  for  f  =  0, 1,  ...,T  and  fc  =  1,  AT,  using  (for  computational 
efficiency)  a  recursive  Kalman  smoothing  filter,  which  comprises  a  forward  filter  initialized 
at  time  0  with 


5So+1)m  =  o, 


and  the  dummy  covariance  matrix 


and  given,  for  t  =  0, 1, ...,  T  -  1,  by  the  recursions 

Vw  =  a*  f,|;+1>(*)  +  $i+I). 

<V\k)  =  O*)  n;+lJC{Ht+l,k  O*o  HrWJ,  +  r 


Sift*)  HU U,  +  }"' . 


Cii.m  =  /&?(*)  -  ^wt*)  O*)- 

+  w&t1’ (*>{*<*$  - 

and  a  backward  filter  initialized  at  time  T  with 

4;1)=i§1|(*) 

and  given,  for  f  =  T  —  1, ...,  2, 1,  by  the  recursion 

C1’ =  +  **”’(*)  (^ft*))"1^1! 

and,  for  f  =  0,  by 


jw  a*  4+1> 


k  = 


Ktk£  — 


m)  ( 


(pSiit)w)“1{$si+d  -  sir’w} 

(89) 

(90) 

contributions  for  t  =  1,.. 

,,T;  *  =  1,...,. 

S';  and 

(r  -  4+1>)  (r  -  Ha 

^fe+1))  dr 

P^+l) 

rtkl 
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12.  Estimated  measurement  covariance  matrices  for  t  =  1,  ...,T  and  A;  =  1, AT  : 
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7(i+l) 
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\y  tw  /  ru  J  J  ntu 
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13.  Estimated  target  covariance  matrices  for  k  =  1, M  : 


y^T  foO+1)  _  2?  *(*+!)  ^  (; f(*+ !)  _  p  C-O+OY 


Ql11-1/  _  _ 

Ew  (llZtll/ft<!+1)) 

This  completes  one  step  of  the  histogram-PMHT  algorithm  for  a  single  batch. 


The  only  theoretically  justified  convergence  tests  are  based  on  the  rate  of  increase  in  the 
overall  likelihood  function.  In  practice,  effective  tests  can  be  based  on  the  rate  of  change 
of  the  target  state  estimates  or  on  other  suitable  grounds.  Another  strategy  that  may 
also  be  effective  in  practice  is  simply  to  compute  a  fixed  number  of  iterations  and  terminate. 
However,  useful  alternative  convergence  tests  are  application-dependent  and  are  not  pursued 
here. 


During  algorithm  development,  it  is  highly  recommended  that  the  overall  likelihood  func¬ 
tion  be  calculated  and  its  monotonic  increase  verified.  Monotonicity  is  an  acid  test  for  val¬ 
idating  both  software  implementations  and  numerical  dynamic  range  handling  procedures; 
the  availability  of  such  a  simple  test  is  fortunate  considering  the  detailed  structure  of  the 
histogram-PMHT  algorithm. 


Upon  convergence  at, 
follows,  for  1  <  k  <  M: 


say,  EM  iteration  i* 
Hr{k)  = 

%]T(k)  = 

=  R\ P 

Qtir(k)  =  QP 


,  the  last  estimates  obtained  are  renamed  as 
for  0  <t<T, 

for  1  <  t  <  T,  (91) 

for  1  <  t  <  T, 
for  0  <  t  <  T  -  1. 


The  traditional  notation  of  smoothing  filters  is  adopted  here  to  clarify  the  relationships 
between  the  various  parameter  estimates  in  the  recursive  batch  application. 


This  completes  the  statement  of  the  histogram-PMHT  algorithm  for  a  single  batch  using 
all  the  available  data. 


Error  covariance  matrices  for  the  state  estimates  xt\T(k)  are  not  required  by  the  single 
batch  histogram-PMHT  algorithm,  but  they  are  needed  in  the  recursive  batch  algorithm. 
The  error  covariance  matrix  of  XT\r(k)  is  given  by 

ET\r{k)  =  PprW- 
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For  t  =  T— 1, 1, 0,  the  error  covariance  matrix  of  Xt\r(k)  is  given  by  the  backward  recursion 

X  £t+I|r(fc)  -  P^,(k)]  (p£},t(k))~l  Ftt  P$\k),  (92) 

where  the  quantities  Pt(|f)(A:)  and  P^\t(k)  are  the  intermediate  covariance  matrices  computed 
in  step  10  above,  after  algorithm  convergence. 


7.2  RECURSIVE  BATCH  ALGORITHM 

It  is  assumed  in  this  section  that  data  vectors  Zt  are  available  for  scans  at  times  t  = 
1,2,3, ...,  and  that  scans  are  processed  by  the  histogram-PMHT  algorithm  using  a  sliding 
batch  of  maximum  length  T.  In  the  startup  phase  when  current  time  t  is  such  that  1  <  t  <  T, 
the  histogram-PMHT  algorithm  is  applied  to  all  available  scans.  The  batch  length  thus  grows 
steadily  and  is  always  equal  to  t.  The  batch  is  full  for  the  first  time  when  t  =  T.  Thereafter, 
the  batch  is  refreshed  by  adjoining  the  newest  scan  and  deleting  the  oldest  scan,  thereby 
keeping  batch  length  T  fixed. 

Let  £  >  1  denote  the  time  of  the  most  recent  scan.  During  the  startup  phase,  the 
histogram-PMHT  algorithm  described  in  the  preceding  section  is  used  without  modification 
for  the  current  batch  length.  Using  the  notation  in  (91)  and  (92),  the  histogram-PMHT 
outputs  at  time  t  are  now  denoted  as 

■|a'n|t(^))  Tn|t(A;),  7rn|t(fc),  Rn\t(k),  Qn\t{P) ^  n  0,1, 

This  completes  the  startup  phase. 

When  current  time  t  >  T,  it  is  necessary  to  remove  the  oldest  data  scan  from  the  batch. 
The  current  batch  comprises  data  from  the  most  recent  T  scans, 

{Zt-T+li  Zt-T+2,  •••>  Zt-i,  Zt} . 

The  set  of  all  deleted  data  vectors  is 


{Z1,Z2,  ...,Zt.T}  ■ 

Let  Pst_T{Xt~T  |  Vi, ...,  Nt-r)  denote  the  posterior  PDF  of  Xt-r  conditioned  on  the  quantized 
vectors  Ni, ...,  Nt-r  that  correspond  to  the  deleted  data  vectors  {Zi,  Z2, ...,  Zt-r}  ■  Adjusting 
the  notation  in  the  obvious  way  to  accommodate  the  sliding  batch  gives  the  modified  data- 
dependent  a  priori  density  as 


[pe,_t(X,-t  |  JVi . W,-T)]''*-T'c+"f-T-c  n 


W’n.S+WS.S 


n=t-T+ 1 
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The  only  significant  difference  between  (29)  and  (93)  is  that  the  exponent  of  the  leading  term 
in  (93)  is  changed  because  the  data  at  scan  t  —  T  are  generated  in  the  manner  described  in 
section  3.3.  The  auxiliary  function  Q*(t)  is  found  by  taking  the  limit  as  -»  0  (cf.  Section 
4.4).  The  result  is 


\\Zt-r\\ 

P(X'  T)  (Xt-r\Zi, ...,  Zt- t) 

n=t-T+ 1  '  n' 

M  t  S 

+E  E  E 

k — 0  Ti — 7i — T 1 

The  primary  difference  between  (94)  and  (56)  is  the  additional  term  contributed  at  time 

t  —  T.  The  M  state  estimates  [xt-T\t-T{k)}  are  fixed  and  not  subject  to  further  update,  so 
it  is  assumed  that 


r^T)  Jb  w  fk{r\X'n)  \ogfk(r\Xn)  dr.  (94) 


M 

Pzt-T(Xt-T  |  Nu  ...,  Nt_T )  =  JJ  iV  [xt-T,k ;  Xt-T\t-T(k),  Ei_T|f_T(/c))  .  (95) 

A;=l  ' 

Substituting  the  approximation  (95)  gives  the  modified  form  of  the  total  squared  error  (68) 
of  the  k- th  target.  The  additional  term  at  time  t  —  T  gives  an  additional  term  in  the  upper 
left-hand  corner  block  of  the  matrix  of  the  necessary  equations,  denoted  by  TkX(k)  =  bk  in 
section  6.2.  The  equivalent  Kalman  smoothing  filter  (cf.  equation  (75))  now  has  a  diffuse  a 
priori  density,  as  is  seen  from  the  equivalent  likelihood  expression 


x  ^ Xt-T,k ;  %t—T\t—T (k) ,  - -^-T<t-T\t-T(k ) 

t 

X  IT  (^Xn,k]  Qn-l,fcl  X  (znk\  HnkXnk,  Rnk)  } 

n=t-T+l  '  V  '  J 


Further  details  are  straightforward  and  have  been  omitted. 

For  t  >  T,  after  the  batch  is  refreshed,  the  histogram-PMHT  algorithm  described  in 
section  7.1  is  applied  to  the  data  set 


{Zt-T+l,  Zt-T+2,  •••,  Zt~ i,  Zt} 

to  compute  the  parameter  estimates.  However,  the  initializations  (87)  and  (88)  of  the  Kalman 
smoothing  filter  are  changed  to 


(k)  =  Xt-T\t-T(k ) 


(96) 
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and  the  covariance  matrix 


p(*+ 1) 

±£E-%_T]t_T{k\ 

&t-T 


(97) 


respectively,  where  the  probability  is  defined  as  in  equation  (86)  using  the  current 

algorithm  iterates  X^T]t(k)  and  R^T]t(k).  Also,  the  backward  recursion  (89)  now  runs  from 
t  =  T-ltot  =  0  because  the  special  step  (90)  is  not  required  when  the  a  priori  density  is 
not  diffuse. 


The  outputs  of  the  histogram-PMHT  algorithm,  including  the  error  covariance  matrices 
of  the  state  estimates,  are  denoted  in  the  smoothing  filter  notation  as 


xn\t(k )  for  t  —  T  <n<t, 

En|  t(k)  for  t-T<n<t, 

TTn\t(k)  for  t  —  T  +  1 

Rn\t(k)  for  t  -  T  +  1  <  n  <  t, 

Qn\t(k)  for  t  —  T  <  n  <  t  —  1. 

This  completes  the  statement  of  the  recursive  form  of  the  histogram-PMHT  algorithm. 
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8.  VARIATIONS  ON  THE  THEME 


Independent  target  models  have  been  assumed  throughout  the  report  to  allow  clear  focus 
on  the  central  ideas  and  to  simplify  the  analysis;  however,  this  assumption  can  be  modified 
significantly  without  changing  either  the  fundamental  theoretical  framework  or  the  structure 
of  the  histogram-PMHT  algorithm.  For  example,  the  assumption  of  independent  targets  is 
incorrect  when  the  targets  are  models  of  frequency  lines  and  these  lines  are  known  to  be 
harmonically  related.  It  is  straightforward  in  this  case  to  use  the  functional  dependencies 
between  the  lines  as  linear  equality  constraints  on  the  line  parameters;  that  is,  the  means  and 
covariances  of  harmonically  related  lines  satisfy  specified  linear  constraints  (transformations). 
Linear  equality  constraints  are  readily  incorporated  into  the  M-step  of  the  histogram-PMHT 
algorithm,  as  indicated  in  section  6.  For  further  details  of  this  particular  application,  see 
Luginbuhl  [2,  chapter  5]. 

The  basic  strategy  for  modeling  dependencies  between  targets  can  be  generalized  signifi¬ 
cantly  and  adapted  to  other  applications.  Thus,  known  functional  relationships  arising  from 
physical  or  other  considerations  in  the  application  can  be  modeled  as  parametric  constraints 
on  the  M-step  optimization  problem.  The  difficulty  of  solving  the  M-step  will  depend  heavily 
on  the  mathematical  character  of  the  constraints  and  will,  in  general,  require  the  use  of  nu¬ 
merical  procedures.  The  practicality  of  the  resulting  algorithm  will  be  application-dependent 
in  general. 

A  single  Gaussian  density  may  be  inadequate  in  some  applications  to  capture  the  cell-to- 
cell  variation  of  one  or  more  of  the  targets  on  the  display.  Loosely  speaking,  such  a  target  may 
be  best  described  as  a  “blob.”  Such  problems  may  arise,  for  example,  when  the  underlying 
target  signal  model  is  poorly  understood  and  not  well  matched  to  the  signal  processor  whose 
outputs  feed  the  sensor  display.  For  such  applications,  it  is  probably  worthwhile  to  use  a 
mixture-of-mixtures  density  (see  Streit  and  Luginbuhl  [11])  to  model  the  target’s  cell-to-cell 
variation.  Specifically,  if  the  k-th  target  is  a  blob,  then  the  mixture  model  density  (30)  is  such 
that  component  Gk(r\Xt)  is  itself  modeled  as  a  Gaussian  mixture  whose  parameters  must  be 
estimated.  The  derivation  of  the  histogram-PMHT  algorithm  for  this  case  requires  adding  a 
fourth  stage  of  missing  data,  namely,  the  assignment  of  measured  data  to  components  within 
the  blob  mixture.  While  such  an  algorithm  for  blob-like  targets  is  easily  derived  using  the 
methods  presented  in  this  report,  it  is  left  for  future  work. 
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9.  SUMMARY 


The  histogram-PMHT  algorithm  is  a  multi-target  tracking  algorithm  designed  to  be 
used  with  the  entire  sensor  output  data  stream.  It  completely  avoids  the  thresholding  losses 
incurred  by  the  traditional  methods  of  generating  point  measurements  by  peak  picking,  three- 
point  interpolation,  etc.  The  theoretical  development  of  the  histogram-PMHT  algorithm 
presented  in  this  report  has  a  mathematically  sound  foundation  based  on  the  framework  of 
PMHT. 

The  negative  multinomial  density  is  potentially  very  useful  in  applications  in  which  data 
compression  and  thresholding  procedures  result  in  truncated  sensor  measurements  for  some 
sensor  cells.  In  other  applications,  limited  resource  availability  may  require  collecting  mea¬ 
surements  in  only  a  subset  of  cells.  The  negative  multinomial  model  compensates  for  missing 
measured  data  by  using  an  expectation  operator  to  extrapolate  the  given  data  into  trun¬ 
cated  cells;  hence,  it  may  reduce  parameter  estimation  bias  and  other  undesirable  edge  effects 
induced  by  cell  truncation. 
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